ABSTRACT




In this paper, we introduce a new fuzzy clustering algorithm to detect an unknown number of planar and 
quadric shapes in noisy data. The proposed algorithm is computationally and implementationally simple, and 
overcomes many of the drawbacks of the existing algorithms that have been proposed for similar tasks. Since the 
clustering is performed in the original image space, and since no features need to be computed, this approach is 
particularly suited for sparse data. The algorithm may also be used in pattern recognition applications. 

1. Introduction 


Boundary detection and surface approximation are important components of intermediate-level vision. 
They are the first step in solving problems such as object recognition and orientation estimation. Recently, it has 
been shown that these problems can be viewed as clustering problems with appropriate distance measures and 
prototypes [1-4]. Dave's Fuzzy C Shells (FCS) algorithm [1] and the Fuzzy Adaptive C-Shells (FACS) algorithm 
[4] have proven to be successful in detecting clusters that can be described by circular arcs, or more generally by 
elliptical shapes. Unfortunately, these algorithms are computationally rather intensive since they involve the solution 
of coupled nonlinear equations for the shell (prototype) parameters. These algorithms also assume that the number of
clusters is known. To overcome these drawbacks we recently proposed a computationally simpler Fuzzy C
Spherical Shells (FCSS) algorithm [3] for clustering hyperspherical shells and suggested an efficient algorithm to 
determine the number of clusters when this is not known. We also proposed the Fuzzy C Quadric Shells (FCQS) 
algorithm [2] which can detect more general quadric shapes. One problem with the FCQS algorithm is that it uses 
the algebraic distance, which is highly nonlinear. This results in unsatisfactory performance when the data is not
very "clean" [4]. Finally, none of the algorithms can handle situations in which the clusters include lines/planes and
there is much noise. To summarize, the existing algorithms to detect quadric shell clusters have one or more of the 
following drawbacks: i) they are computationally expensive, ii) the distance measure used in the objective function 
can yield distorted estimates of prototype parameters if the data is not well behaved, iii) they assume that the number 
of clusters C is known, iv) their formulations do not allow the degenerate case of lines/planes, and v) they are not 
very robust in the presence of noise. In this paper, we address these drawbacks in more detail and propose a new
algorithm to overcome them.

2. The Fuzzy C Quadrics Algorithm 

Let $x_j = [x_{j1}, x_{j2}, \ldots, x_{jn}]$ be a point in the n-dimensional feature space. We may define the algebraic (or
residual) distance from $x_j$ to a prototype $\beta_i$ that resembles a second-degree curve as:

$$
\begin{aligned}
d^2_{Qij} = d^2_Q(x_j, \beta_i) = \big(\, & p_{i1}x_{j1}^2 + p_{i2}x_{j2}^2 + \cdots + p_{in}x_{jn}^2 + p_{i(n+1)}x_{j1}x_{j2} + p_{i(n+2)}x_{j1}x_{j3} + \cdots \\
& + p_{is}x_{j(n-1)}x_{jn} + p_{i(s+1)}x_{j1} + p_{i(s+2)}x_{j2} + \cdots + p_{i(s+n)}x_{jn} + p_{i(s+n+1)} \big)^2 \\
= {} & p_i^T q_j q_j^T p_i = p_i^T M_j p_i.
\end{aligned}
\tag{1}
$$

The prototypes $\beta_i$ are represented by the parameter vectors $p_i = [p_{i1}, p_{i2}, \ldots, p_{ir}]^T$ with
$r = s + n + 1 = \frac{n(n+1)}{2} + n + 1 = \frac{(n+1)(n+2)}{2}$ components which define the equation of the curve. The $M_j$ in (1) are given by

$$M_j = q_j q_j^T, \quad \text{with } q_j = [x_{j1}^2, x_{j2}^2, \ldots, x_{jn}^2, x_{j1}x_{j2}, \ldots, x_{j(n-1)}x_{jn}, x_{j1}, x_{j2}, \ldots, x_{jn}, 1]^T. \tag{2}$$
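As an illustration of (1) and (2), the following minimal sketch builds the monomial vector $q_j$ for a point of any dimension and evaluates the algebraic distance. The function names and the unit-circle prototype are illustrative assumptions, not part of the original algorithm description.

```python
import numpy as np

def quadric_features(x):
    """Monomial vector q of (2): squares, cross terms, linear terms, constant."""
    x = np.asarray(x, dtype=float)
    n = x.size
    squares = x ** 2
    # Cross terms in the order x1*x2, x1*x3, ..., x_{n-1}*x_n.
    cross = [x[a] * x[b] for a in range(n) for b in range(a + 1, n)]
    return np.concatenate([squares, cross, x, [1.0]])

def algebraic_distance_sq(x, p):
    """Squared algebraic distance (p^T q)^2 of (1)."""
    q = quadric_features(x)
    return float(np.dot(np.asarray(p, dtype=float), q) ** 2)

# Hypothetical prototype: the unit circle x1^2 + x2^2 - 1 = 0 in 2-D.
p_circle = np.array([1.0, 1.0, 0.0, 0.0, 0.0, -1.0])
```

For the point (2, 0), which lies at geometric distance 1 from the unit circle, this algebraic distance evaluates to 9, which illustrates how nonlinear the measure is.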

We may now minimize the following objective function, which is similar to the objective function used in the Fuzzy
C-Means (FCM) algorithm [6] except for the distance measure:

$$J_Q(B, U) = \sum_{i=1}^{C} \sum_{j=1}^{N} (\mu_{ij})^m d^2_{Qij} = \sum_{i=1}^{C} \sum_{j=1}^{N} (\mu_{ij})^m p_i^T M_j p_i, \tag{3}$$

where $B = (\beta_1, \ldots, \beta_C)$, C is the number of clusters, N is the total number of feature vectors, and $U = [\mu_{ij}]$ is the
C×N fuzzy C-partition matrix satisfying the following conditions:

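$$\mu_{ij} \in [0,1] \text{ for all } i \text{ and } j, \quad \sum_{i=1}^{C} \mu_{ij} = 1 \text{ for all } j, \quad \text{and} \quad 0 < \sum_{j=1}^{N} \mu_{ij} < N \text{ for all } i. \tag{4}$$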

Note that $J_Q(B,U)$ is homogeneous with respect to $p_i$. Therefore, we need to constrain the problem in order to avoid the
trivial solution. Some of the possibilities are:

(i) $p_{i1} = 1$, (ii) $p_{ir} = 1$, (iii) $\|p_i\|^2 = 1$, and

$$\text{(iv)} \quad \left\| p_{i1}^2 + p_{i2}^2 + \cdots + p_{in}^2 + \frac{p_{i(n+1)}^2}{2} + \frac{p_{i(n+2)}^2}{2} + \cdots + \frac{p_{is}^2}{2} \right\| = 1. \tag{5}$$

In [4] Dave et al. have also proposed a Fuzzy C Quadrics (FCQ) algorithm using constraint (i). This constraint is
more restrictive than constraint (iv) used in the FCQS algorithm proposed in [3]. Moreover, the resulting distance
measure is not invariant to translations and rotations of the prototypes. Constraints (ii) and (iii) are also not suitable
for the same reason. In other words, these constraints make the distance $d^2_{Qij}$ a function of not just the relative
location of point $x_j$ to curve $\beta_i$, but also the actual location and orientation of the curve $\beta_i$ in feature space, which is
undesirable. However, constraint (iv) makes the distance invariant to translations and rotations [5]. Other constraints
are also possible, and one of them will be discussed in Section 4. With constraint (iv) the minimization of (3)
reduces to an eigenvector problem, and its implementation is straightforward. Minimization with respect to the
memberships is similar to the FCM case [6]. It is easy to show that the memberships are updated according to

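$$
\mu_{ij} =
\begin{cases}
\dfrac{1}{\sum_{k=1}^{C} \left( d^2_{Qij} / d^2_{Qkj} \right)^{1/(m-1)}} & \text{if } I_j = \emptyset, \\[2ex]
0, \quad i \notin I_j & \text{if } I_j \neq \emptyset, \\[1ex]
\sum_{i \in I_j} \mu_{ij} = 1 & \text{if } I_j \neq \emptyset,
\end{cases}
\tag{6}
$$

where $I_j = \{ i \mid 1 \le i \le C,\ d^2_{Qij} = 0 \}$. The original FCQS algorithm is summarized below.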


THE FUZZY C QUADRIC SHELLS (FCQS) ALGORITHM:

Fix the number of clusters C; fix m, 1 < m < ∞;

Set iteration counter l = 1;

Initialize the fuzzy C-partition U^(0) using the FCSS algorithm;

Repeat

    Compute p_i^(l) for each cluster β_i by minimizing (3) subject to (5);

    Update U^(l) using (6);

    Increment l;

Until (||U^(l-1) - U^(l)|| < ε);
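The membership update step of the listing above can be sketched as follows. This is a minimal NumPy illustration of the FCM-style rule in (6), assuming the squared distances are supplied as a C×N array; sharing the membership equally among zero-distance clusters is one admissible choice for the singular case, and the function name is hypothetical.

```python
import numpy as np

def update_memberships(d2, m=2.0):
    """FCM-style membership update; d2 is a (C, N) array of squared distances."""
    d2 = np.asarray(d2, dtype=float)
    U = np.zeros_like(d2)
    zero = d2 == 0.0                      # entries that define the sets I_j
    singular = zero.any(axis=0)           # columns j with non-empty I_j
    regular = ~singular
    if regular.any():
        # mu_ij = d_ij^(-2/(m-1)) / sum_k d_kj^(-2/(m-1)), equivalent to (6).
        w = d2[:, regular] ** (-1.0 / (m - 1.0))
        U[:, regular] = w / w.sum(axis=0)
    if singular.any():
        # Assumption: split the unit membership equally among clusters in I_j.
        z = zero[:, singular].astype(float)
        U[:, singular] = z / z.sum(axis=0)
    return U
```

Each column of the returned matrix sums to one, as required by (4).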

The FCQS algorithm has the following drawbacks: i) Since the algebraic distance given by (1) is highly 
nonlinear, the membership assignments are not very meaningful, ii) the constraint in (5) strictly speaking does not 
allow us to fit linear (or planar) clusters. We now address these drawbacks in more detail and propose modifications 
of the algorithm to overcome these drawbacks. 

3. The Modified Fuzzy C Quadric Shells Algorithm 

To overcome the problem due to the nongeometric nature of $d^2_{Qij}$, one may use the geometric
(perpendicular) distance (denoted by $d^2_{Pij}$) between the point $x_j$ and the shell $\beta_i$ given by

$$d^2_{Pij} = \min \|x_j - z_{ij}\|^2 \quad \text{such that} \quad z_{ij}^T A_i z_{ij} + z_{ij}^T v_i + b_i = 0, \tag{7}$$

where $z_{ij}$ is a point lying on the quadric curve describing cluster $\beta_i$. Using a Lagrange multiplier $\lambda$, Equation (7) can
be solved for $z_{ij}$ as

$$z_{ij} = \tfrac{1}{2} (I - \lambda A_i)^{-1} (\lambda v_i + 2 x_j). \tag{8}$$
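For completeness, the step from (7) to (8) can be written out as below. This is a sketch that assumes $A_i$ is symmetric and uses one particular sign convention for the multiplier $\lambda$; the Lagrangian symbol $L$ is introduced here only for illustration.

```latex
\begin{aligned}
L(z_{ij}, \lambda) &= \|x_j - z_{ij}\|^2 - \lambda \left( z_{ij}^T A_i z_{ij} + z_{ij}^T v_i + b_i \right), \\
\frac{\partial L}{\partial z_{ij}} &= -2 (x_j - z_{ij}) - \lambda \left( 2 A_i z_{ij} + v_i \right) = 0, \\
2 (I - \lambda A_i) z_{ij} &= \lambda v_i + 2 x_j
\quad \Longrightarrow \quad
z_{ij} = \tfrac{1}{2} (I - \lambda A_i)^{-1} (\lambda v_i + 2 x_j).
\end{aligned}
```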


Substituting (8) in (7) yields a quartic (fourth-degree) equation in $\lambda$ in the 2-D case, which has at most four real roots
$\lambda_k$, $1 \le k \le 4$. They can be computed using the standard closed-form solution from mathematical tables. For each real root $\lambda_k$
so computed, we calculate the corresponding $(z_{ij})_k$ using (8). Then, we may compute $d^2_{Pij} = \min_k \|x_j - (z_{ij})_k\|^2$.
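The 2-D procedure just described can be sketched numerically as follows. This is an illustrative implementation under stated assumptions: $A_i$ is symmetric, the quartic is assembled in the eigenbasis of $A_i$, and its roots are found with NumPy's polynomial root finder rather than the closed-form formulas mentioned in the text. Roots at which $(I - \lambda A_i)$ is singular are skipped, so degenerate configurations (for example, a point at the centre of a circle) are not handled. All names are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial as Poly

def perpendicular_distance_sq(x, A, v, b, tol=1e-6):
    """Squared distance from a 2-D point x to the conic z^T A z + z^T v + b = 0."""
    x = np.asarray(x, dtype=float)
    a, R = np.linalg.eigh(np.asarray(A, dtype=float))   # A = R diag(a) R^T
    xr = R.T @ x                                         # point in the eigenbasis
    vr = R.T @ np.asarray(v, dtype=float)
    # In the eigenbasis, (8) reads z_k = (lam*v_k + 2*x_k) / (2*(1 - lam*a_k)).
    den = [Poly([1.0, -float(a[k])]) for k in range(2)]
    num = [Poly([2.0 * float(xr[k]), float(vr[k])]) for k in range(2)]
    # Constraint (7) multiplied by 4*den_0^2*den_1^2 gives a quartic in lam.
    quartic = 4.0 * float(b) * den[0] ** 2 * den[1] ** 2
    for k in range(2):
        other = den[1 - k] ** 2
        quartic = quartic + float(a[k]) * num[k] ** 2 * other
        quartic = quartic + 2.0 * float(vr[k]) * num[k] * den[k] * other
    best = np.inf
    for root in quartic.roots():
        if abs(root.imag) > tol:
            continue                                     # keep real roots only
        d = 1.0 - root.real * a
        if np.any(np.abs(d) < tol):
            continue                                     # (I - lam*A) is singular
        z = (root.real * vr + 2.0 * xr) / (2.0 * d)
        best = min(best, float(np.sum((xr - z) ** 2)))
    return best
```

For the unit circle (A the identity, v zero, b = -1) and the point (2, 0), the two admissible roots give the candidates (1, 0) and (-1, 0), and the smaller squared distance is 1.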


One can formulate the FCQS algorithm using $d^2_{Pij}$ as the underlying distance measure [15]. Minimizing the
resulting objective function with respect to U yields an equation identical to (6) where $d^2_{Qij}$ is replaced by $d^2_{Pij}$.
However, minimizing the objective function with respect to the parameters $p_i$ results in coupled nonlinear equations
with no closed-form solution. To overcome this problem, we may assume that we can obtain approximately the same
solution by minimizing (3) subject to (5), which will be true if all the feature points lie reasonably close to the
hyperquadric shells. This assumption leads to the Modified FCQS (MFCQS) algorithm. Our experimental results
show that in the 2-D case the MFCQS algorithm gives much better results and converges much faster than
the original version. In fact, our extensive simulations indicate that the performance of this algorithm is excellent as
long as the data points are all reasonably close to the curves (i.e., as long as the data is not highly scattered), which
will be true in most computer vision applications. This may be attributed to the fact that the membership assignment
based on the perpendicular distance is more reasonable.


The MFCQS algorithm can also be used to find linear clusters, even though the constraint in (5) forces all 
prototypes to be of second degree. This is because the algorithm usually fits either two coincident lines (for a single 
line), or an extremely elongated ellipse (for two parallel lines) or a hyperbola (for two crossing lines). It is quite 
simple to recognize these situations from the parameters of the prototypes, and when these situations occur, we can 
simply split such prototypes to a pair of lines after the algorithm converges. 
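The text does not spell out how the degenerate situations are recognized. One standard possibility, given here only as an assumed sketch, is to test the determinant of the 3×3 symmetric conic matrix (zero for a pair of lines) together with the discriminant of the quadratic part. The parameter ordering follows (2) in 2-D, the tolerance is an arbitrary illustrative value that would have to be loosened for noisy fits, and the function name is hypothetical.

```python
import numpy as np

def classify_conic(p, tol=1e-8):
    """Classify p = [x1^2, x2^2, x1*x2, x1, x2, 1] coefficients of a 2-D conic."""
    p = np.asarray(p, dtype=float)
    p = p / np.linalg.norm(p)             # remove the arbitrary scale of p
    a, c, b, d, e, f = p
    M = np.array([[a, b / 2, d / 2],
                  [b / 2, c, e / 2],
                  [d / 2, e / 2, f]])
    disc = b * b - 4 * a * c              # sign separates hyperbolic/elliptic types
    if abs(np.linalg.det(M)) < tol:       # degenerate conic
        if disc > tol:
            return "crossing lines"
        if disc > -tol:
            return "parallel, coincident, or single line"
        return "single point"
    if disc > tol:
        return "hyperbola"
    if disc > -tol:
        return "parabola"
    return "ellipse"
```

A prototype flagged as degenerate can then be factored into its component lines after convergence, as described above.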


It is to be noted that $d^2_{Pij}$ has a closed-form solution only in the 2-D case. In higher dimensions, solving for
$d^2_{Pij}$ is not trivial. For example, in the three-dimensional case, this results in a sixth-degree equation, which needs to
be solved iteratively. This makes the algorithm slow. We now propose an alternative formulation of the algorithm to
overcome this problem.

4. The Fuzzy C Plano-Quadric Shells Algorithm


When the exact (geometric) distance has no closed-form solution, one of the methods suggested in the
literature is to use what is known as the "approximate distance", which is the first-order approximation of the exact
distance. It is easy to show [7] that the approximate distance of a point from a curve is given by

$$d^2_{Aij} = d^2_A(x_j, \beta_i) = \frac{d^2_{Qij}}{\|\nabla d_{Qij}\|^2} = \frac{p_i^T M_j p_i}{p_i^T D_j D_j^T p_i}, \tag{9}$$

where $\nabla d_{Qij}$ is the gradient of the distance functional

$$p_i^T q = [p_{i1}, p_{i2}, \ldots, p_{ir}] \, [x_1^2, x_2^2, \ldots, x_n^2, x_1 x_2, \ldots, x_{n-1} x_n, x_1, x_2, \ldots, x_n, 1]^T$$

evaluated at $x_j$. In (9) the matrix $D_j$ is simply the Jacobian of q evaluated at $x_j$,

$$D_j = \left[ \frac{\partial q}{\partial x_1}, \ldots, \frac{\partial q}{\partial x_n} \right]_{x = x_j}. \tag{10}$$
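To make (9) concrete, the sketch below evaluates the approximate distance in the 2-D case, where q has six components and the Jacobian of q is a 6×2 matrix. The function names are illustrative assumptions, and points where the gradient vanishes (for example, the centre of a circle) are not handled.

```python
import numpy as np

def features_2d(x):
    """q = [x1^2, x2^2, x1*x2, x1, x2, 1] for a 2-D point."""
    x1, x2 = x
    return np.array([x1 * x1, x2 * x2, x1 * x2, x1, x2, 1.0])

def jacobian_2d(x):
    """6x2 Jacobian of q with respect to (x1, x2); its last row is zero."""
    x1, x2 = x
    return np.array([[2 * x1, 0.0],
                     [0.0, 2 * x2],
                     [x2, x1],
                     [1.0, 0.0],
                     [0.0, 1.0],
                     [0.0, 0.0]])

def approximate_distance_sq(x, p):
    """First-order approximate distance: (p^T q)^2 / ||D^T p||^2."""
    p = np.asarray(p, dtype=float)
    grad = jacobian_2d(x).T @ p           # gradient of p^T q at x
    return float((p @ features_2d(x)) ** 2 / (grad @ grad))
```

For the unit circle and the point (2, 0) this gives 9/16, compared with an algebraic distance of 9 and an exact squared distance of 1; for a line the approximate distance is exact.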


One can easily reformulate the quadric shell clustering algorithm with $d^2_{Aij}$ as the underlying distance
measure. The objective function to be minimized in this case becomes

$$J_A(B, U) = \sum_{i=1}^{C} \sum_{j=1}^{N} (\mu_{ij})^m d^2_{Aij} = \sum_{i=1}^{C} \sum_{j=1}^{N} (\mu_{ij})^m \frac{p_i^T M_j p_i}{p_i^T D_j D_j^T p_i}. \tag{11}$$

Unfortunately, the minimization of the resulting objective function with respect to $p_i$ in general leads to coupled
nonlinear equations which can only be solved iteratively. To avoid this problem, we choose the constraints

$$\sum_{j=1}^{N} (\mu_{ij})^m p_i^T D_j D_j^T p_i = \sum_{j=1}^{N} (\mu_{ij})^m, \quad \text{or} \quad p_i^T G_i p_i = N_i, \quad i = 1, \ldots, C, \tag{12}$$

where

$$G_i = \sum_{j=1}^{N} (\mu_{ij})^m D_j D_j^T \quad \text{and} \quad N_i = \sum_{j=1}^{N} (\mu_{ij})^m. \tag{13}$$


The above constraint has been applied in the hard case by Taubin [8] with good results when there is only one curve 
to be fitted. Our contribution is to extend it to the fuzzy case and to fit C curves simultaneously. Using (12) and 
Lagrange multipliers, we may now minimize 
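$$
\begin{aligned}
& \sum_{i=1}^{C} \sum_{j=1}^{N} (\mu_{ij})^m d^2_{Aij} - \sum_{i=1}^{C} \lambda_i \left( p_i^T G_i p_i - N_i \right) \\
& \quad = \sum_{i=1}^{C} \sum_{j=1}^{N} (\mu_{ij})^m \frac{p_i^T M_j p_i}{p_i^T D_j D_j^T p_i} - \sum_{i=1}^{C} \lambda_i \left[ \sum_{j=1}^{N} (\mu_{ij})^m p_i^T D_j D_j^T p_i - \sum_{j=1}^{N} (\mu_{ij})^m \right].
\end{aligned}
$$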


When most of the data points are close to the prototypes, the memberships $\mu_{ij}$ will be quite hard (i.e., they will be
close to 0 or 1). This assumption is also quite good if we use the possibilistic memberships [9] discussed in Section
5. This means that when the constraint in (12) is satisfied, we may say that $p_i^T D_j D_j^T p_i \approx 1$. In fact, it is easy to
show that the condition $p_i^T D_j D_j^T p_i = 1$ is exactly true for the case of lines/planes and certain quadrics such as
circles and cylinders. Hence, we will obtain approximately the same solution if we minimize


$$\sum_{i=1}^{C} \sum_{j=1}^{N} (\mu_{ij})^m p_i^T M_j p_i - \sum_{i=1}^{C} \lambda_i \left[ \sum_{j=1}^{N} (\mu_{ij})^m p_i^T D_j D_j^T p_i - \sum_{j=1}^{N} (\mu_{ij})^m \right].$$

If we assume that the prototypes are independent of each other, then this is equivalent to independently minimizing 
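$$\sum_{j=1}^{N} (\mu_{ij})^m p_i^T M_j p_i - \lambda_i \left[ \sum_{j=1}^{N} (\mu_{ij})^m p_i^T D_j D_j^T p_i - \sum_{j=1}^{N} (\mu_{ij})^m \right] = p_i^T F_i p_i - \lambda_i \left( p_i^T G_i p_i - N_i \right),$$

where

$$F_i = \sum_{j=1}^{N} (\mu_{ij})^m M_j. \tag{16}$$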

The solution of (16) is given by the generalized eigenvector problem 

$$F_i p_i = \lambda_i G_i p_i, \tag{17}$$

which can be converted to the standard eigenvector problem if the matrix $G_i$ is not rank-deficient. Unfortunately,
$G_i$ is always rank-deficient, because the last row of $D_j$ is always $[0, \ldots, 0]$. Equation (17) can still be solved using other
techniques that use the modified Cholesky decomposition [8], and the solution is computationally quite inexpensive
when the feature space is 2-D or 3-D. Another advantage of this constraint is that it can also fit lines and planes in
addition to quadrics. Minimization of (11) with respect to the memberships $\mu_{ij}$ leads to

$$
\mu_{ij} =
\begin{cases}
\dfrac{1}{\sum_{k=1}^{C} \left( d^2_{Aij} / d^2_{Akj} \right)^{1/(m-1)}} & \text{if } I_j = \emptyset, \\[2ex]
0, \quad i \notin I_j & \text{if } I_j \neq \emptyset, \\[1ex]
\sum_{i \in I_j} \mu_{ij} = 1 & \text{if } I_j \neq \emptyset.
\end{cases}
\tag{18}
$$

In the 2-D case, $d^2_{Aij}$ in the above equation may also be substituted by $d^2_{Pij}$. We notice that in practice this gives
more rapid convergence. The resulting clustering algorithm, which we call the Fuzzy C Plano-Quadric Shells
algorithm, is summarized below.
THE FUZZY C PLANO-QUADRIC SHELLS (FCPQS) ALGORITHM:

Fix the number of clusters C; fix m, 1 < m < ∞;

Set iteration counter l = 1;

Initialize the fuzzy C-partition U^(0);

Repeat

    Compute the matrices F_i and G_i using (13) and (16);

    Compute p_i^(l) for each cluster β_i by solving (17);

    Update U^(l) using (18);

    Increment l;

Until (||U^(l-1) - U^(l)|| < ε);
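The prototype update step of the listing above can be sketched for the 2-D case as follows. This is not the modified Cholesky technique of [8]; instead, as a swapped-in simplification, the constant coefficient is eliminated analytically (the last row and column of $G_i$ are zero), which leaves a 5×5 generalized eigenproblem that is reduced to a standard symmetric one with an ordinary Cholesky factor. The sketch assumes that the leading 5×5 block of $G_i$ is positive definite, which fails for exactly collinear data. All function names are hypothetical.

```python
import numpy as np

def features_2d(X):
    """Rows q_j = [x1^2, x2^2, x1*x2, x1, x2, 1] for an (N, 2) array X."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1 ** 2, x2 ** 2, x1 * x2, x1, x2, np.ones_like(x1)])

def jacobians_2d(X):
    """Stack of 6x2 Jacobians D_j, shape (N, 6, 2)."""
    D = np.zeros((X.shape[0], 6, 2))
    D[:, 0, 0] = 2 * X[:, 0]
    D[:, 1, 1] = 2 * X[:, 1]
    D[:, 2, 0] = X[:, 1]
    D[:, 2, 1] = X[:, 0]
    D[:, 3, 0] = 1.0
    D[:, 4, 1] = 1.0
    return D

def fit_prototype(X, mu, m=2.0):
    """Solve F p = lambda G p for one cluster, normalized so that p^T G p = N_i."""
    w = np.asarray(mu, dtype=float) ** m
    Q = features_2d(X)
    D = jacobians_2d(X)
    F = (Q * w[:, None]).T @ Q                       # sum_j w_j q_j q_j^T
    G = np.einsum('j,jab,jcb->ac', w, D, D)          # sum_j w_j D_j D_j^T
    # Last row of F p = lambda G p gives the constant term: c = -f^T a / f0.
    F11, f, f0 = F[:5, :5], F[:5, 5], F[5, 5]
    A = F11 - np.outer(f, f) / f0
    L = np.linalg.cholesky(G[:5, :5])                # assumes positive definiteness
    Linv = np.linalg.inv(L)
    vals, vecs = np.linalg.eigh(Linv @ A @ Linv.T)
    a = Linv.T @ vecs[:, 0]                          # smallest eigenvalue minimizes
    p = np.append(a, -(f @ a) / f0)
    return p * np.sqrt(w.sum() / (p @ G @ p))
```

With hard memberships and points sampled exactly on the unit circle, the recovered parameter vector is proportional to [1, 1, 0, 0, 0, -1].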

5. Robust Shell Clustering 

The algorithms discussed above will be sensitive to outlier points even when the objective function based 
on the approximate distance is minimized. To overcome this problem, we have converted the algorithm to a 
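possibilistic algorithm [9]. This is very easily achieved by updating the memberships according to

$$\mu_{ij} = \frac{1}{1 + \left( d^2_{Aij} / \eta_i \right)^{1/(m-1)}} \tag{23}$$

instead of (18). In (23), one attractive choice for $\eta_i$ in practice is the average fuzzy intra-cluster distance given by

$$\eta_i = \frac{\sum_{j=1}^{N} (\mu_{ij})^m d^2_{Aij}}{\sum_{j=1}^{N} (\mu_{ij})^m}. \tag{24}$$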

Our experimental results show that the resulting algorithm, which we call the Possibilistic C Plano-Quadric Shells
(PCPQS) algorithm, is quite robust in the presence of poorly defined boundaries (i.e., when the edge points are
somewhat scattered around the ideal boundary curve in the 2-D case and when the range values are not very accurate
in the 3-D case). It is also largely immune to impulse noise and outliers, as can be seen in the examples presented in
Section 7. A possibilistic version of the Modified FCQS algorithm (denoted by MPCQS) was also implemented.
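A minimal sketch of the possibilistic update is given below. It assumes the standard possibilistic membership form of [9], $\mu_{ij} = 1 / (1 + (d^2_{ij}/\eta_i)^{1/(m-1)})$, with $\eta_i$ taken as the membership-weighted average squared distance of cluster i; the function names and array layout (C×N) are illustrative assumptions.

```python
import numpy as np

def estimate_eta(U, d2, m=2.0):
    """Average fuzzy intra-cluster distance per cluster; U and d2 are (C, N)."""
    w = np.asarray(U, dtype=float) ** m
    return (w * d2).sum(axis=1) / w.sum(axis=1)

def possibilistic_memberships(d2, eta, m=2.0):
    """Possibilistic memberships: each cluster is updated independently."""
    d2 = np.asarray(d2, dtype=float)
    eta = np.asarray(eta, dtype=float)
    return 1.0 / (1.0 + (d2 / eta[:, None]) ** (1.0 / (m - 1.0)))
```

Unlike the fuzzy update, the columns of the result need not sum to one, so a distant outlier receives a low membership in every cluster instead of being forced to share a unit of membership among them.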

6. Determination of Number of Clusters 

The number of clusters C is not known a priori in some pattern recognition applications and most 
computer vision applications. When the number of clusters is unknown, one method to determine this number is to 
perform clustering for a range of C values, and pick the C value for which a suitable validity measure is minimized 
(or maximized) [10,12]. However this method is rather tedious, especially when the number of clusters is large. 
Also, in our experiments, we found that the C value obtained this way may not be optimum. This is because when C 
is large, the clustering algorithm sometimes converges to a local minimum of the objective function, and this may 
result in a bad value for the validity of the clustering, even though the value of C is correct. Moreover, when C is 
greater than the optimum number, the algorithm may split a single shell cluster into more than one cluster, and yet 
achieve a good value for the overall validity. To overcome these problems, we propose an alternative Unsupervised 
C Shell Clustering algorithm which is computationally more efficient, since it does not perform the clustering for an 
entire range of C values. 

Our proposed method progressively clusters the data starting with an overspecified number $C_{max}$ of
clusters. Initially, the FCPQS algorithm is run with $C = C_{max}$. After the algorithm converges, spurious clusters (with
low validity) are eliminated; compatible clusters are merged; and points assigned to clusters with good validity are
temporarily removed from the data set to reduce computations. The FCPQS algorithm is invoked again with the
remaining feature points. The above procedure is repeated until no more elimination, merging, or removing occurs,
or until C = 1. This algorithm is summarized below.
THE UNSUPERVISED POSSIBILISTIC C PLANO-QUADRIC SHELLS (UPCPQS) ALGORITHM:

Set C = C_max; fix m, 1 < m < ∞;

CRemoved := 0; MergeFlag := EliminateFlag := RemoveFlag := TRUE;

While C > 1 and (MergeFlag = TRUE or EliminateFlag = TRUE or RemoveFlag = TRUE) do

    MergeFlag := EliminateFlag := RemoveFlag := FALSE;

    Perform the PCPQS algorithm with the number of clusters = C;

    Eliminate spurious clusters using validity, decrement C accordingly,
    and set EliminateFlag := TRUE if any elimination has occurred;

    Merge compatible prototypes among the C prototypes, update C,
    and set MergeFlag := TRUE if merging has occurred;

    Remove good clusters using validity, update C, and set RemoveFlag := TRUE if
    any good clusters are removed;

    Save the remaining clusters' prototypes;

End While

Replace all the removed feature points back into the data set;

Append the list of remaining clusters' prototypes from the last iteration of the while loop
to the list of removed clusters' prototypes;

Do

    Perform the PCPQS algorithm with the new C;

    Merge compatible prototypes in the prototype list and update C;

    Eliminate tiny clusters and decrement C accordingly;

Until no more merging or elimination takes place;


One way to determine if two clusters are compatible (i.e., whether they can be merged) is to estimate the best fit for
all the points having a membership greater than an α-cut in the two clusters. If the validity for the resulting cluster is
good, then the two clusters are considered mergeable. The above algorithm also requires a validity measure to
discriminate between good and bad clusters. Several cluster validity criteria have been presented in the literature.
For example, performance measures based on the memberships in the partition matrix U have been proposed by
some researchers [6,10]. Unfortunately, these are not very effective for shell clusters, since they do not reflect the
actual geometric structure of the data set. One possible validity measure we may define is the shell thickness
measure, which is simply the sum of the squared errors of the fit for the ith cluster given by
(19)


However, it is difficult to estimate a "good" value for this validity measure in noisy conditions. Validity measures
may also be defined using hypervolume and density [11,12]. To do this, the distance vector from a feature point to a
shell prototype is first defined as $\delta_{ij} = (x_j - z_{ij})$, where $z_{ij}$ is the closest point on the curve (or surface) to the feature
point $x_j$ in the approximate distance sense. The fuzzy spherical shell covariance matrix is defined by

$$\Sigma_i = \frac{\sum_{j=1}^{N} (\mu_{ij})^m \delta_{ij} \delta_{ij}^T}{\sum_{j=1}^{N} (\mu_{ij})^m}. \tag{20}$$

Using (20), the fuzzy shell hypervolume and the shell density may be defined as

$$V_i = \sqrt{\det(\Sigma_i)}, \quad \text{and} \quad D_i = \frac{S_i}{V_i}, \tag{21}$$

where $S_i$ is the sum of close members of shell $\beta_i$ given by

$$S_i = \sum_{j} \mu_{ij} \quad \text{such that} \quad \delta_{ij}^T \Sigma_i^{-1} \delta_{ij} < 1. \tag{22}$$
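The hypervolume and density measures can be sketched as follows. The code assumes the usual fuzzy covariance form (membership-weighted outer products of the distance vectors, normalized by the sum of weights), a hypervolume equal to the square root of the covariance determinant, and a density equal to the sum of memberships of points whose Mahalanobis-type distance is below one, divided by the hypervolume. These forms and the function name are assumptions made for illustration.

```python
import numpy as np

def shell_validity(delta, mu, m=2.0):
    """Return (covariance, hypervolume, density) for one shell cluster.

    delta: (N, n) array of vectors from each point to its closest shell point.
    mu:    (N,) memberships of the points in this cluster.
    """
    delta = np.asarray(delta, dtype=float)
    mu = np.asarray(mu, dtype=float)
    w = mu ** m
    cov = (delta * w[:, None]).T @ delta / w.sum()
    volume = float(np.sqrt(np.linalg.det(cov)))
    # Quadratic form delta^T cov^{-1} delta for every point.
    maha = np.einsum('ja,ab,jb->j', delta, np.linalg.inv(cov), delta)
    close_sum = float(mu[maha < 1.0].sum())          # sum of "close" members
    return cov, volume, close_sum / volume
```

A thin, well-populated shell yields a small hypervolume and a large density, although, as noted in the text, these values also depend on cluster size and noise.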

However, the above measures are not very reliable because their values can vary widely for good clusters, depending 
on the sizes of the clusters and noise. They can also be "good" for spurious clusters. Therefore, we have developed a 
new validity measure for shell clusters based on the idea of curve (surface) density, which is a measure of the 
number of feature points per unit length (surface area) of the shell cluster. We have also developed methods to 
estimate the effective curve length (surface area) of the shell clusters when the curves (surfaces) are partial. A more 
detailed discussion of this validity measure will be the subject of a future paper. 
7. Experimental Results

Although the algorithms presented in the previous sections are applicable to feature spaces of any 
dimension, in this paper we present only results of two-dimensional data sets. In all the examples shown in this 
paper, the UPCPQS algorithm was applied with the fuzzifier m = 2 and $C_{max}$ = 25. To obtain a good initialization
of the fuzzy C-partition $U^{(0)}$, we run the Gustafson-Kessel algorithm with m = 1.5 for a few iterations (which gives
an excellent linear approximation of the data) followed by the Fuzzy C Spherical Shells algorithm [2]. This was
observed to give excellent results. The data sets consist of object edges obtained by applying an edge operator to
real images. Uniformly distributed noise with an interval of 30 was added to the images to make them noisy. The
edge images were then thinned [14] to reduce the number of pixels to be processed. The resulting input images
typically had about 2000 points. The PCPQS algorithm still sometimes fits second-degree curves for linear clusters,
especially when the data is scattered. Therefore, the algorithm was modified to identify such situations and split such
clusters into lines after convergence, as explained in Section 3. In practice, there seems to be very little difference
between the PCPQS and MPCQS algorithms in the 2-D case.

Figure 1(a) shows the original noisy image of a box with holes. The edge-detected and thinned image is 
shown in Figure 1(b). As can be seen, there are many noise points, and the pixel boundaries are not always well- 
defined. Figure 1(c) shows the result of the UPCPQS algorithm. The final prototypes are shown superimposed on the 
edge image. The prototypes are virtually unaffected by noise and poor boundaries. Figure 1(d) shows the "cleaned" 
edge map. This is obtained by plotting the boundaries generated by the prototypes only in locations where there is at 
least one pixel with a high membership value in a 3x3 neighborhood. Figures 2 and 3 show similar results for 
images with collections of objects of various sizes and shapes. 
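The construction of the "cleaned" edge map can be sketched as below. The sketch assumes that the prototype boundaries have already been rasterized into a boolean image and that a per-pixel membership image is available (zero where there is no edge pixel); the threshold of 0.5 for a "high" membership and the function name are illustrative assumptions, since the text does not give a value.

```python
import numpy as np

def clean_edge_map(curve_mask, membership, threshold=0.5):
    """Keep prototype pixels that have a high-membership pixel in their 3x3 window."""
    membership = np.asarray(membership, dtype=float)
    H, W = membership.shape
    padded = np.pad(membership, 1, mode="constant", constant_values=0.0)
    local_max = np.zeros_like(membership, dtype=float)
    for dy in range(3):
        for dx in range(3):
            # Shifted view covering the neighbour at offset (dy - 1, dx - 1).
            local_max = np.maximum(local_max, padded[dy:dy + H, dx:dx + W])
    return np.asarray(curve_mask, dtype=bool) & (local_max >= threshold)
```

Portions of a fitted curve that pass through empty regions of the image are suppressed, so partial arcs are drawn only where supporting edge pixels exist.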

8. Summary 

In this paper, we propose a new approach to boundary and surface approximation in computer vision. 
Current techniques to describe boundaries and surfaces in terms of parametrized or algebraic forms have the 
following disadvantages: i) Many techniques apply in cases when the boundaries/surfaces belonging to different 
objects have already been segmented, ii) they look for local structures and use edge following or region growing and 
hence would be sensitive to local aberrations and deviations in shapes, iii) they are computationally intensive and
the memory requirements are high, iv) they require features (such as curvature and surface normals) to be calculated
and hence are sensitive to noise and the computed features are inaccurate at boundaries of surfaces, v) most of the 
feature-based techniques assume dense data and hence are not suitable if the data is sparse or if there are gaps in the 
data, and vi) some methods are not invariant to rigid transformations. The approach we propose overcomes these 
drawbacks. If the clustering is performed in the feature space, it can have the disadvantages of high dimensionality, 
and loss of pixel adjacency information. However, since the proposed methods apply clustering techniques directly 
to image data, they do not suffer from these disadvantages. Another disadvantage of clustering methods is that the 
number of clusters has to be known in advance. The proposed approach overcomes this problem by using new 
cluster validity measures and compatible cluster merging. 

Linear and quadric shapes are not sufficiently general for all computer vision applications. We propose to
extend our algorithm to more general shells such as those represented by algebraic curves, or superquadrics. 
Currently there are no algorithms that simultaneously fit an unknown number of general curves (or surfaces) to noisy 
and/or scattered data. This includes boundaries and surfaces that are locally very noisy, and boundaries and surfaces 
that are sparsely sampled. Methods based on feature computation and region growing do not work in these cases. 